Abstract —For the purpose of propagating information and ideas through a social network, a seeding strategy aims to find a small set
of seed users that is able to maximize the spread of influence, which is termed the influence maximization problem. Although a
large number of works have studied this problem, the existing seeding strategies are limited to static social networks. In fact, due to
high-speed data transmission and the large population of participants, the diffusion processes in real-world social networks are
uncertain in many respects. Unfortunately, as shown in the experiments, in such cases the state-of-the-art seeding strategies are
pessimistic as they fail to trace the dynamic changes in a social network. In this paper, we study strategies that select seed users in
an adaptive manner. We first formally define the Dynamic Independent Cascade model and introduce the concept of adaptive seeding
strategy. Then, based on the proposed model, we show that a simple greedy adaptive seeding strategy finds an effective solution with a
provable performance guarantee. Besides the greedy algorithm, an efficient heuristic algorithm is provided to meet practical
requirements. Extensive experiments have been performed on both real-world networks and synthetic power-law networks. The
results demonstrate the superiority of the adaptive seeding strategies over other standard methods.

Index Terms —Social network influence, adaptive seeding strategy, stochastic submodular maximization. 
1 Introduction

With the advance of information science in the last two
decades, social networks have become important
dissemination platforms as they allow efficient interchange
of ideas and information. The influence diffusion process in
social networks has been studied in many domains, e.g.,
epidemiology, social media, and economics. It has been shown
that the investigation of influence diffusion is of great
use in many aspects, such as designing marketing strategies
[1], [2], analyzing human behavior [3], and rumor blocking
[4]. In order to formulate the diffusion process, a number
of models have been studied during the last decade. Two
basic operational models, the linear threshold (LT) model and
the independent cascade (IC) model, are proposed by Kempe et
al. [5]. In the Linear Threshold Model, a user will adopt a
new idea if the influence from its neighbors has reached a
certain threshold, while in the Independent Cascade Model
an adopter has a certain probability of convincing each of its
neighbors. Based on those two models, various other models have
been developed and studied.
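
To make the Independent Cascade dynamics concrete, the following is a minimal Python sketch of a single simulated cascade. The function name `simulate_ic`, the adjacency-dictionary representation, and the toy line network are illustrative assumptions and are not taken from Kempe et al.

```python
import random

def simulate_ic(graph, probs, seeds, rng):
    """Run one Independent Cascade diffusion and return the set of active nodes.

    graph: dict mapping each node to a list of out-neighbors (assumed format).
    probs: dict mapping each edge (u, v) to its fixed propagation probability.
    seeds: iterable of initially active nodes.
    rng:   a random.Random instance, so that runs can be reproduced.
    """
    active = set(seeds)
    frontier = list(active)          # nodes activated in the previous round
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                # Each newly active u gets exactly one chance to activate v.
                if v not in active and rng.random() < probs[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

# Hypothetical line network a -> b -> c.
graph = {"a": ["b"], "b": ["c"], "c": []}
certain = {("a", "b"): 1.0, ("b", "c"): 1.0}   # every attempt succeeds
blocked = {("a", "b"): 0.0, ("b", "c"): 1.0}   # the first edge never fires
print(sorted(simulate_ic(graph, certain, ["a"], random.Random(0))))  # ['a', 'b', 'c']
print(sorted(simulate_ic(graph, blocked, ["a"], random.Random(0))))  # ['a']
```

The two extreme probability settings are chosen only so that the outcome does not depend on the random draws; with intermediate probabilities the returned set varies from run to run.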

In the topic of influence diffusion, an important issue
is how to propagate information through a social network
effectively and efficiently. As an example, in order to
advertise new products, a company would like to offer free
samples to a set of initial users who will potentially introduce
the new product to their friends. Because of the expense,
only a limited number of samples are available and
thus we have a budget on the number of seed users. A natural problem
is how to select a good set of seed users that is able
to maximize the number of customers who finally adopt
the target product. This problem is known as the influence
maximization problem in the literature.

A large body of related work has addressed the influence
maximization problem, but the state-of-the-art techniques
may not deal effectively with many real cases.
A drawback of the existing diffusion models is that they fail
to take account of the uncertain nature of a real-world
social network. Such uncertainty can be viewed from the
following three aspects. First, in a real-world social network, the
seed users are not assured to be successfully activated. In
the example of selling a new product, the advertising would
be stuck if the free samples do not satisfy the initial users.
Second, the information is not guaranteed to be delivered
from one user to another, and thus the diffusion itself
is a probabilistic process. Third, the topology of a
social network is not always static in real cases due to
the frequent variation of the degree of the relationship
between users. In an online social network
such as Facebook, Twitter or Flickr, topology changes are
caused by the increasing number of common friends
between a pair of users. In this paper, we study the influence
maximization problem in social networks with the above
characteristics. By extending the classic IC model, we
develop the Dynamic Independent Cascade (DIC) model,
which is able to capture the dynamic aspects of real social
networks. In the classic IC model, a seed node is guaranteed
to be activated once selected and the relationship between
two users is simply represented by a fixed probability, while
the seed nodes in our DIC model can fail to be activated
with a certain probability and the propagation probability
between two users follows a certain distribution which
reflects the change of topology of a social network.

Based on the DIC model, we further consider how to
design a seeding strategy to find effective seed nodes. For
the classic IC model, Kempe et al. [5] propose a simple
greedy algorithm with an approximation ratio of $(1 - 1/e)$,
and Chen et al. present an efficient heuristic seeding
approach to handle large-scale social networks. The existing
approaches always make the seeding selection in a static
manner (i.e., determining a seed set before the process of
spread), which renders them inapplicable to the DIC model.
As mentioned earlier, the seed users in the DIC model are
not guaranteed to be activated. In this setting, an arising
issue is that we can seed a user more than once if it
has not been successfully activated in the past rounds. One can see
that it is worth making more effort to activate a powerful
user, as he or she may generate considerable influence in a
social network. However, a static seeding algorithm cannot
take such a case into account. Besides, to determine a seed
set, the prior algorithms require the propagation probability
between users, but in the DIC model such a probability is
a random variable and we can only expect a distribution
over it. Admittedly, we could take its expected
value and then apply the prior approaches. But such a method
would be pessimistic as it fails to trace the dynamic topology
of a real-world social network. In this paper, we first provide
a simple adaptive seeding strategy that is able to handle
the dynamic aspects of real-world social networks, and then
design a heuristic algorithm for better scalability.

1.1 Related Work and Technique 

Domingos et al. are among the first to study
influential nodes in viral marketing. In the seminal work [5],
Kempe et al. formulate the influence maximization problem
from the view of combinatorial optimization, and provide a
greedy algorithm with an approximation ratio of $(1 - 1/e)$.
Efficient heuristic influence maximization algorithms have
been studied in many works, including [8]. Long et al.
further study this problem from the perspective of
minimization. Du et al. and Rodriguez et al. propose
the continuous diffusion model and study the influence
maximization problem in this setting. All the above works
aim to determine an effective seed set before the diffusion
process and focus on networks with a static topology.

In order to obtain a provable performance guarantee,
submodular functions play an important role in the prior
works. Kempe et al. [5] show that the expected number
of active nodes is a monotonically increasing submodular
function over the seed set, and therefore, by the celebrated
result of Nemhauser et al., a simple greedy algorithm yields a $(1 - 1/e)$
approximation. However, as shown later in Sec. 3, such a
technique cannot be directly applied to the adaptive seeding
problem. On the one hand, the seed nodes are unknown
before the diffusion process as they are adaptively selected;
on the other hand, the value of the objective function over a
certain seed set cannot be explicitly observed.

The adaptive seeding strategy is a stochastic optimization
framework and a natural extension of the original seeding
approach. Part of the analysis in this paper is based
on stochastic submodular maximization. Asadpour et
al. present the analysis of the stochastic submodular
maximization problem where the objective function is
defined on the power set of a set of independent random
variables. Golovin et al. further study this problem with
the concept of adaptive submodularity. Although the above
works are only applicable to special cases of the adaptive
influence maximization problem, they provide a clue that
the greedy algorithm in its adaptive version is still able
to achieve a provable performance guarantee. In a recent
work, Seeman et al. consider the adaptive approach to
a variant of the influence maximization problem where the seed
nodes are constrained to a certain set and the influence can
spread for only one round, which is a different setting
from that of this paper.

1.2 Contribution 

The contributions of this paper are summarized as follows.
We propose the DIC model, which is able to capture the
dynamic aspects of real-world social networks. In order to
provide a formal description of an adaptive seeding strategy,
we introduce the concept of seeding pattern. The main
contribution of this paper is an adaptive hill-climbing strategy
with a provable performance guarantee in the DIC model.
We further design an efficient heuristic adaptive seeding
strategy by narrowing the candidate seed sets before the
seeding process. The conducted experiments demonstrate
the superiority of the proposed adaptive seeding strategies
over the original seeding approaches in dynamic social
networks.

The rest of the paper is organized as follows. The
proposed DIC model and the adaptive seeding strategy are
formulated in Sec. 2. The analysis of the greedy adaptive
strategy is shown in Sec. 3 and the heuristic strategy is
proposed in Sec. 4. In Sec. 5 we show the experimental
results. Sec. 6 concludes.

2 Problem Setting
2.1 DIC Model

A social network is modeled as a directed graph where
nodes and edges denote the individuals and social ties,
respectively. In order to spread an idea or advertise a new
product in a social network, some seed nodes are chosen
to be activated (e.g., by giving payments or offering free
samples) to trigger the spread of influence. Following the
notations in [5], we speak of each node as being either active
or inactive. A node can be activated either by its neighbor or
as a seed node.

In the DIC model, associated with each node $u$ there is
a random variable $X_u$ following a Bernoulli distribution $f_u$,
where $X_u = 1$ indicates that node $u$ is successfully activated as
a seed node. For the relationship between nodes, an active
node $u$ has one chance to activate its inactive neighbor $v$ via
edge $(u, v)$ with a probability of $X_{(u,v)}$, which is a random
variable. With the activated seed nodes, the diffusion process
goes round by round. Without loss of generality, for each
edge $e$, we assume $X_e$ follows a certain discrete distribution
$f_e$ with a domain $D_e$, and let $d_e^i \in [0, 1]$ be the $i$-th value in
$D_e$. In this paper, we do not enforce any specific distribution
of $X_e$ (see footnote 1). In the DIC model, for an edge $e = (u, v)$, the value
of $X_e$ remains unknown until one of the neighbors of $u$
1. We may assume an exponential distribution as a social network
always exhibits a power-law pattern where the influential users are
rare [17].
Symbol               Definition
$G$                  Instance of a DIC network.
$G_1$                Example DIC network in Example 1.
$B$                  Budget of the seed set.
$D_e$                Domain of the propagation probability of edge $e$.
$d_e^i$              The $i$-th value in $D_e$.
$Prob[X_u = 1]$      The probability that $u$ is activated as a seed node when selected.
$A$                  Seeding pattern.
$A_0$                Special seeding pattern defined in Def. 3.
$A^*$                Special seeding pattern defined in Def. 4.
$S_A^G$              Seeding strategy of pattern $A$ on $G$.
$OPT_A^G$            Optimal seeding strategy of pattern $A$ on $G$.
c-$G$                Auxiliary graph of network $G$.
$x$                  Full realization.
$y$                  Partial realization.
$\epsilon$           Empty realization.

TABLE 1: Notations.

is active. This is because in practice an industry institute
may only trace the influence it is interested in, and the real-time
state of the rest of the network is unavailable. We denote
an instance of a DIC network by $G = (V, E, F_V, F_E)$, where
$F_V = \{f_u \mid u \in V\}$ and $F_E = \{f_e \mid e \in E\}$ are the sets of
the distributions of $X_u$ and $X_e$, respectively. Let $N$ be the
number of nodes in $V$. Due to the expense of activating
seed nodes, we have a budget $B$ ($B < N$) for the seed set.
The notations that are frequently used in this paper are
listed in Table 1; those in Table 1 that are not yet defined will
be introduced later.
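
As an illustration of the tuple $G = (V, E, F_V, F_E)$, the following Python sketch stores the two families of distributions and draws the random variables $X_u$ and $X_e$ from them. The class name `DICNetwork`, its field names, and the numeric values are hypothetical choices made for this sketch only.

```python
import random
from dataclasses import dataclass

@dataclass
class DICNetwork:
    """Illustrative container for G = (V, E, F_V, F_E); field names are assumptions."""
    nodes: list        # V
    edges: list        # E, as (u, v) pairs
    seed_prob: dict    # u -> Prob[X_u = 1], the Bernoulli parameter of f_u
    edge_dist: dict    # (u, v) -> {value d in D_e: probability f_e(d)}

    def sample_seed(self, u, rng):
        """Draw X_u: True if this seeding attempt on u succeeds."""
        return rng.random() < self.seed_prob[u]

    def sample_edge_prob(self, e, rng):
        """Draw X_e from the discrete distribution f_e over D_e."""
        values = list(self.edge_dist[e])
        weights = [self.edge_dist[e][d] for d in values]
        return rng.choices(values, weights=weights, k=1)[0]

    def expected_edge_prob(self, e):
        """E[X_e], the single value a static strategy would have to work with."""
        return sum(d * p for d, p in self.edge_dist[e].items())

# Hypothetical two-node network with one edge.
net = DICNetwork(
    nodes=["u", "v"],
    edges=[("u", "v")],
    seed_prob={"u": 0.9, "v": 0.9},
    edge_dist={("u", "v"): {0.1: 0.5, 0.5: 0.5}},
)
rng = random.Random(7)
print(net.sample_seed("u", rng), net.sample_edge_prob(("u", "v"), rng))
print(round(net.expected_edge_prob(("u", "v")), 6))  # 0.3
```

Keeping the full distribution per edge, rather than only its mean, is what allows an adaptive strategy to react to the realized value once it is observed.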

2.2 Adaptive Seeding Strategy 

Basically, to design an adaptive seeding strategy we
consider two problems: (1) how much of the budget to use in
each seeding step and (2) which nodes to select. We employ
the following concepts to formulate those problems.

Assuming that the seed nodes are only selected between
two spread rounds, we denote the seeding step between
round $i-1$ and round $i$ as the $i$-th seeding step, and the first
seeding step is executed before the process of spread. We
assume that we need one round to activate the seed nodes
selected in each seeding step. In this paper, we consistently
use "step" for the seeding process and "round" for the diffusion
process.

Definition 1. A seeding pattern $A = (a_1, \ldots, a_N)$ is a
sequence of non-negative integers, implying that we seed
$a_i$ nodes in the $i$-th seeding step. We will later show that
we have at most $N$ seeding steps. Due to the budget
constraint, $\sum_{i=1}^{N} a_i \le B$. Note that it reduces to non-adaptive
seeding if $A = (B)$. Corresponding to a seeding pattern
$A = (a_1, \ldots, a_N)$, a seeding strategy $S_A = (s_1, \ldots, s_N)$ of $A$ is
a sequence of node-sets where $|s_i| = a_i$ and $s_i$ is the node-set
seeded in the $i$-th seeding step. That $a_i = 0$ implies that
we do not seed any node in the $i$-th seeding step and thus
$s_i = \emptyset$.


In the above setting, both the seeding pattern and the
seeding strategy can be adaptively constructed, i.e., $a_i$ and $s_i$
may depend on the outcomes of the past rounds. For a
specific DIC network $G$, we use $S_A^G$ to denote a seeding
strategy of pattern $A$ on $G$. Since the DIC model is a
probabilistic model, the objective function herein is the expected
number of final active nodes when no node
can be further activated and no budget is left. We denote
the expected number of active nodes in $G$ under a seeding
strategy $S_A^G$ by $E[S_A^G]$.

Definition 2. Given a strategy $S_A^G$ on a DIC network $G$, if
$s_i = \emptyset$ but there does not exist any edge $(u, v)$ such that $u$
is activated, either by its neighbors or as a seed node, in the
$(i-1)$-th round, we say that $S_A^G$ waits for a null round. It can
be easily seen that waiting for a null round has no impact
on the process of spread or the effect of the strategy. Unless
otherwise stated, we assume that any strategy will not wait
for one or more null rounds. Therefore, we have at most $N$
seeding steps and $s_1 \neq \emptyset$ for any strategy $S_A^G = (s_1, \ldots, s_N)$.
For the convenience of analysis, we require that any strategy
$S_A^G$ will not select an active node as a seed node.

Two natural patterns $A_0$ and $A^*$ are defined as follows.

Definition 3. Let $A_0 = (a_1, \ldots, a_N)$ where $a_i = 1$ for $1 \le
i \le B$ and $a_i = 0$ for $i > B$. Informally, under pattern $A_0$
we successively seed one node in each step until the budget
is used up.
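
The patterns discussed so far can be written down directly. The sketch below builds $A_0$ and the non-adaptive pattern $A = (B)$ and checks the budget constraint; the helper names are hypothetical, and a pattern shorter than $N$ is assumed to be implicitly padded with zeros.

```python
def is_valid_pattern(pattern, budget, num_nodes):
    """Check that a seeding pattern has at most N entries, all non-negative
    integers, and that it does not spend more than the budget B in total."""
    return (
        len(pattern) <= num_nodes
        and all(isinstance(a, int) and a >= 0 for a in pattern)
        and sum(pattern) <= budget
    )

def pattern_a0(budget, num_nodes):
    """Pattern A_0: one seed in each of the first B steps, none afterwards."""
    return tuple(1 if i < budget else 0 for i in range(num_nodes))

def pattern_static(budget, num_nodes):
    """Non-adaptive pattern: the whole budget is spent in the first step."""
    return (budget,) + (0,) * (num_nodes - 1)

N, B = 6, 3
print(pattern_a0(B, N))                    # (1, 1, 1, 0, 0, 0)
print(pattern_static(B, N))                # (3, 0, 0, 0, 0, 0)
print(is_valid_pattern((1, 1, 0, 1), B, N))  # True
```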

Definition 4. Another pattern A* is adaptively constructed 
as follows. In pattern A*, we seed one node at a time and 
wait until no node can be further activated before seeding 
the next node. Thus, we seed one node in the first step and 
the rest of seeding pattern will be constructed adaptively. 

Note that given a pattern $A$ there exist many strategies
of $A$. We use $OPT_A^G$ to denote the optimal adaptive strategy
of pattern $A$ on a given DIC network $G$ with respect to the
expected number of active nodes.

The core problem considered in this paper is defined as
follows.

Problem 1. Adaptive Influence Maximization (AIM).
Under the budget constraint, for any DIC network $G$, find a
pattern $A$ and a strategy $S_A^G$ of $A$ on $G$ such that $E[S_A^G]$ is
maximized.

2.3 An Example 

We employ the following example to illustrate the DIC
model and the concept of seeding pattern.

Example 1. Consider an example DIC network $G_1 =
(V, E, F_V, F_E)$ with six nodes and five edges, as shown
in Fig. 1, where $f_v(1) = 0.5$ for each $v \in V$, and
$D_e = \{0.4, 0.8\}$ with $f_e(0.4) = 0.8$ for each $e \in E$. In
this example, each node can be activated with a probability
of 0.5 when selected as a seed node, and the propagation
probability between two connected nodes could be 0.4 or
0.8 with probabilities 0.8 and 0.2, respectively. We set the
budget $B$ to be three. Suppose a certain seeding strategy $S_{A_1}^{G_1}$
produces a sequence of seed sets $(\{v_3\}, \{v_3\}, \emptyset, \{v_1\})$
of pattern $A_1 = (1, 1, 0, 1)$. In this concrete seeding process,
$S_{A_1}^{G_1}$ seeds $v_3$ twice, in steps 1 and 2, which
implies that it fails to activate $v_3$ the first time. Such a strategy
may depend on the outcomes of the past rounds or the
propagation probability observed in each step.

Fig. 1: Example network $G_1$.
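
A few quantities implied by the parameters of Example 1 can be checked with simple arithmetic. The short Python sketch below is only a worked calculation; the variable and function names are hypothetical.

```python
seed_success = 0.5                 # f_v(1) in Example 1
edge_dist = {0.4: 0.8, 0.8: 0.2}   # f_e over D_e = {0.4, 0.8}

# Expected propagation probability of a single edge: 0.4*0.8 + 0.8*0.2.
expected_edge_prob = sum(d * p for d, p in edge_dist.items())

def activated_within(k, p=seed_success):
    """Probability that a node is activated within k independent seeding attempts."""
    return 1.0 - (1.0 - p) ** k

# Probability of the observed prefix: v3 fails in step 1 and succeeds in step 2.
fail_then_succeed = (1.0 - seed_success) * seed_success

print(round(expected_edge_prob, 2))   # 0.48
print(activated_within(2))            # 0.75
print(fail_then_succeed)              # 0.25
```

The second number shows why re-seeding can pay off: a second attempt on the same node raises its activation probability from 0.5 to 0.75.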

3 Greedy Algorithm 

In this section, we show the main result of this paper. The
seed selection rule of the greedy algorithm is as
follows.

Rule 1. In each seeding step, we select the node that
maximizes the marginal profit conditioned on the
observed events.

Note that in each step we can observe the following:
(1) the outcomes of the past rounds; (2) the propagation
probabilities between the active nodes and their neighbors.
We can see that Rule 1 can be applied to any pattern. For
a pattern $A$ and a DIC network $G$, we use $\overline{S}_A^G$ to denote
the seeding strategy following Rule 1. Our analysis consists
of three steps. First, we propose a transformation approach
which finds an explicit expression of the expected number
of active nodes. Then, we prove that $A^*$ is the optimal
pattern for any DIC network $G$, i.e., for any pattern $A$,
$E[OPT_{A^*}^G] \ge E[OPT_A^G]$. Finally, we show that $\overline{S}_{A^*}^G$ is a
$(1 - 1/e)$-approximation under pattern $A^*$, i.e.,

$$E[\overline{S}_{A^*}^G] \ge (1 - 1/e) \cdot E[OPT_{A^*}^G]. \quad (1)$$
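
The combination of Rule 1 with pattern $A^*$ can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the marginal profit is estimated by Monte Carlo sampling rather than computed exactly, and all function names and the toy network are hypothetical.

```python
import random

def spread(graph, edge_dist, active, frontier, rng):
    """Diffuse from `frontier` until no node can be further activated."""
    active = set(active)
    frontier = list(frontier)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                if v in active:
                    continue
                values = list(edge_dist[(u, v)])
                weights = [edge_dist[(u, v)][d] for d in values]
                p = rng.choices(values, weights=weights, k=1)[0]  # draw X_e
                if rng.random() < p:          # the single activation attempt
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_gain(graph, edge_dist, seed_prob, active, v, samples, rng):
    """Monte Carlo estimate of the expected number of nodes gained by seeding v now."""
    gained = 0
    for _ in range(samples):
        if rng.random() < seed_prob[v]:       # the seeding attempt may fail
            reached = spread(graph, edge_dist, active | {v}, [v], rng)
            gained += len(reached) - len(active)
    return gained / samples

def adaptive_greedy(graph, edge_dist, seed_prob, budget, samples, rng):
    """Seed one node per step and observe the full outcome before the next choice."""
    active, history = set(), []
    for _ in range(budget):
        candidates = [v for v in graph if v not in active]
        if not candidates:
            break
        best = max(candidates, key=lambda v: estimate_gain(
            graph, edge_dist, seed_prob, active, v, samples, rng))
        history.append(best)
        if rng.random() < seed_prob[best]:    # real attempt; a failure is observed
            active = spread(graph, edge_dist, active | {best}, [best], rng)
    return history, active

# Hypothetical deterministic network: a -> b -> c, plus an isolated node d.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": []}
edge_dist = {("a", "b"): {1.0: 1.0}, ("b", "c"): {1.0: 1.0}}
seed_prob = {v: 1.0 for v in graph}
history, active = adaptive_greedy(graph, edge_dist, seed_prob, 2, 5, random.Random(0))
print(history, sorted(active))  # ['a', 'd'] ['a', 'b', 'c', 'd']
```

Because a node whose seeding failed stays inactive, it remains a candidate in the next step, which is how this sketch allows the same user to be seeded more than once.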

3.1 Transformation 

In the classic IC model, a concrete network is a graph where
each edge $(u, v)$ is specified to be either live or not live. If
edge $(u, v)$ is live, then $u$ can successfully activate
$v$. Informally speaking, all the uncertainties are resolved
in a concrete network. In a concrete network, the active
nodes are those which are connected to a seed node via a
path of live edges, and the number of active nodes in a
concrete network is a submodular function over the seed set.
Unfortunately, this approach cannot be directly applied
to the analysis of our DIC model because several cases in
the DIC model cannot be represented by a graph with a
structure identical to that of the original DIC network. For
example, how can we represent the case in which we seed a node
more than once, and how can we depict the feature that each
propagation probability follows a distribution instead of
being a single value? To address such scenarios, we transform
the original network into an auxiliary graph where the active
nodes can be explicitly observed given a seed set.

Given a DIC network $G = (V, E, F_V, F_E)$ where $V =
\{v_1, \ldots, v_N\}$, we construct an auxiliary graph c-$G = (V_c, E_c)$
as follows. $V_c$ consists of $N \cdot B + N$ nodes and is partitioned
into $N + 1$ subsets denoted by $V_c^i$ ($0 \le i \le N$), where $|V_c^0| =
N$ and $|V_c^i| = B$ ($i > 0$). Let $V_c^0 = \{v_{0,1}, \ldots, v_{0,N}\}$ and
$V_c^i = \{v_{i,1}, \ldots, v_{i,B}\}$ ($i > 0$). Nodes in $V_c^0$ correspond
to the nodes in $G$, and nodes in $V_c^i$ ($i > 0$) are used to
represent the multiple seedings on $v_i$ in $G$. $E_c$ consists of
two parts $E_c^1$ and $E_c^2$, defined as follows. For $i > 0$ and
$1 \le j \le B$, we have an edge $(v_{i,j}, v_{0,i})$ for each pair of
$v_{i,j}$ and $v_{0,i}$, and for each pair of nodes $v_{0,i}$ and $v_{0,j}$ in $V_c^0$
($1 \le i \ne j \le N$), we have $|D_{(v_i,v_j)}|$ edges denoted by
$e_{i,j}^k$ ($1 \le k \le |D_{(v_i,v_j)}|$) connecting $v_{0,i}$ to $v_{0,j}$. Let $E_c^1$ be the set
of edges between $V_c^0$ and $V_c^i$ ($i > 0$) and $E_c^2$ be the set of
edges within $V_c^0$. Recall that $D_{(v_i,v_j)}$ is the domain of $f_{(v_i,v_j)}$,
which is the distribution of the propagation probability of
edge $(v_i, v_j)$ in $G$.

Fig. 2: Auxiliary graph c-$G_1$. In Example 1, we have a budget
of three and the propagation probability of each edge in $G_1$
follows a distribution on a domain of two values. Therefore,
we have three nodes $v_{1,1}$, $v_{1,2}$ and $v_{1,3}$ connected to $v_{0,1}$, and
two edges $e_{1,2}^1$ and $e_{1,2}^2$ connecting $v_{0,1}$ and $v_{0,2}$.
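
The construction of the auxiliary graph can be sketched in Python as below. The function name, the tuple encoding of nodes and edges, and the edge list used in the usage example are assumptions of this sketch; in particular, pairs of nodes without an edge in $G$ are assumed to have an empty domain and therefore contribute no parallel edges.

```python
def build_auxiliary_graph(num_nodes, budget, edge_domains):
    """Build the node set and the two edge sets of the auxiliary graph c-G.

    edge_domains: dict mapping (i, j), with 1-indexed node ids, to the list of
    values in the domain of the propagation probability of edge (v_i, v_j).
    """
    ids = range(1, num_nodes + 1)
    # V_c^0: one copy v_{0,i} per original node.
    nodes = [(0, i) for i in ids]
    # V_c^i (i > 0): B copies v_{i,1..B}, one per possible seeding of v_i.
    nodes += [(i, j) for i in ids for j in range(1, budget + 1)]
    # First edge set: seeding edges (v_{i,j}, v_{0,i}).
    seeding_edges = [((i, j), (0, i)) for i in ids for j in range(1, budget + 1)]
    # Second edge set: one parallel edge e^k_{i,j} per value d^k in the domain.
    diffusion_edges = [
        ((0, i), (0, j), k)
        for (i, j), domain in edge_domains.items()
        for k in range(1, len(domain) + 1)
    ]
    return nodes, seeding_edges, diffusion_edges

# Six nodes, budget three, five edges with a two-value domain each.
# The edge list itself is hypothetical; only the counts mirror Example 1.
domains = {(1, 2): [0.4, 0.8], (2, 3): [0.4, 0.8], (3, 4): [0.4, 0.8],
           (4, 5): [0.4, 0.8], (5, 6): [0.4, 0.8]}
nodes, e1, e2 = build_auxiliary_graph(6, 3, domains)
print(len(nodes), len(e1), len(e2))  # 24 18 10
```

The node count matches $N \cdot B + N = 6 \cdot 3 + 6 = 24$.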

The auxiliary graph c-$G_1$ of $G_1$ in Example 1 is shown
in Fig. 2. Further explanations are presented in the caption.

Now we show how to observe the active nodes via c-$G$
given a seeding strategy $S_A^G$. Following the notations
in [14], we introduce the states of edges and the concept of
realization.

Definition 5. A full realization (f-realization) $x$ of c-$G$ is a
mapping from edges in c-$G$ to some states, where each edge
in $E_c^1$ is mapped to {live, not live} and each edge in $E_c^2$ is
mapped to {selected-live, selected-not live, not selected}. In an
f-realization, only one edge from $v_{0,i}$ to $v_{0,j}$ can be mapped
to selected-live or selected-not live.

Definition 6. A partial realization (p-realization) $y$ of c-$G$ is
a mapping from edges to states, where each edge in $E_c^1$ is
mapped to {live, not live, undetermined}, and each edge in
$E_c^2$ is mapped to {selected-live, selected-not live, not selected,
selected-undetermined, undetermined}. In a p-realization, if one
edge from $v_{0,i}$ to $v_{0,j}$ is undetermined then all the edges from
$v_{0,i}$ to $v_{0,j}$ must be undetermined; if one edge from $v_{0,i}$ to
$v_{0,j}$ is either selected-live, selected-not live or selected-undetermined,
then the other edges from $v_{0,i}$ to $v_{0,j}$ must be not selected.
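
The consistency conditions of Definition 6 on the parallel edges between one ordered pair of nodes can be expressed as a small check. In the Python sketch below the state names follow the definition, while the function name is hypothetical; rejecting the case where every parallel edge is not selected is an extra assumption of this sketch (some value of the domain must be the realized one), not something stated in the definition.

```python
SELECTED = {"selected-live", "selected-not live", "selected-undetermined"}

def valid_parallel_states(states):
    """Check the states of all parallel edges from v_{0,i} to v_{0,j}.

    states: list with one state per value in the domain of the edge.
    """
    if "undetermined" in states:
        # If one parallel edge is undetermined, all of them must be.
        return all(s == "undetermined" for s in states)
    chosen = [s for s in states if s in SELECTED]
    # Exactly one edge carries the realized value; the rest are "not selected".
    # (Requiring at least one selected edge is an assumption of this sketch.)
    return len(chosen) == 1 and all(
        s == "not selected" for s in states if s not in SELECTED
    )

print(valid_parallel_states(["undetermined", "undetermined"]))      # True
print(valid_parallel_states(["selected-live", "not selected"]))     # True
print(valid_parallel_states(["selected-live", "undetermined"]))     # False
print(valid_parallel_states(["selected-live", "selected-not live"]))  # False
```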

The explanations of the states are listed in Tables 2 and 3.
Each edge together with its state in c-$G$ corresponds to
an event in the diffusion process of the original network $G$.
We can see that an f-realization is a determinate case of the
diffusion process and a p-realization is an intermediate state
where the events are partially determined. For a seeding
strategy $S_A^G$, the seed nodes selected by $S_A^G$ are determined
only if an f-realization is specified. We use $S_A^G(x)$ to denote
the sequence of seed sets selected by $S_A^G$ under f-realization
$x$.

For an f-realization $x$ and a p-realization $y$, let $Prob[x]$
(resp. $Prob[y]$) be the probability with which $x$ (resp. $y$)
State                  Meaning
live                   $v_i$ in $G$ is successfully activated when selected as a seed node for the $j$-th time.
not live               $v_i$ in $G$ fails to be activated when selected as a seed node for the $j$-th time.
undetermined           The result of the $j$-th seeding on $v_i$ is unknown.

TABLE 2: States of edge $(v_{i,j}, v_{0,i})$ in $E_c^1$, for $i > 0$ and $1 \le j \le B$.

State                  Meaning
selected-live          The propagation probability between $v_i$ and $v_j$ is $d_{(v_i,v_j)}^k$ and $v_i$ activates $v_j$.
selected-not live      The propagation probability between $v_i$ and $v_j$ is $d_{(v_i,v_j)}^k$ and $v_i$ fails to activate $v_j$.
selected-undetermined  The propagation probability between $v_i$ and $v_j$ is $d_{(v_i,v_j)}^k$ and the result of the activation from $v_i$ to $v_j$ is unknown.
not selected           The propagation probability between $v_i$ and $v_j$ is not $d_{(v_i,v_j)}^k$.
undetermined           The propagation probability between $v_i$ and $v_j$ is unknown.

TABLE 3: States of $e_{i,j}^k$ in $E_c^2$, for $1 \le i \le N$, $1 \le j \le N$ and $1 \le k \le |D_{(v_i,v_j)}|$.

Fig. 3: An example f-realization $x_1$ of c-$G_1$ (highlighted edges are live or selected-live). The number aligned with an edge is the propagation probability it stands for.
In this concrete case, the seed nodes are $v_1$ and $v_3$, and the active nodes in $G$ are $v_1$, $v_3$, $v_4$ and $v_5$.


happens and $Prob[x \mid y]$ be the probability that $x$ happens
conditioned on $y$.

Definition 7. An f-realization $x$ is compatible to a
p-realization $y$ if $x$ can be obtained from $y$ by changing
the states of some edges in $y$ from {undetermined,
selected-undetermined} into {selected-live, selected-not live, not selected}.

Informally, that $x$ is compatible to $y$ implies that $x$ is a possible
successive state of $y$ in the diffusion process. Similarly,
we have the compatibility relationship between two
p-realizations. Let $\epsilon$ be the empty realization where all the
edges are in the undetermined state. For a DIC network $G$,
we denote the set of the f-realizations compatible to a
p-realization $y$ by $C_G(y)$.

For each strategy $S_A^G = (s_1, \ldots, s_N)$ on $G =
(V, E, F_V, F_E)$, we have a corresponding seed set $V' \subseteq
\bigcup_{i>0} V_c^i$ in c-$G$, constructed as follows. If $v_i$ in $G$ is selected
by $S_A^G$ for $k$ times, then we add $v_{i,1}, \ldots, v_{i,k}$ in c-$G$ to $V'$. By
this setting, given an f-realization $x$ of c-$G$, the number of
active nodes under $S_A^G$ in $G$ is the number of the nodes in
$V_c^0$ that are connected to a node in $V'$ via live edges in c-$G$.
In the sense of Example 1, an example f-realization $x_1$ with
strategy $(\{v_3\}, \{v_3\}, \emptyset, \{v_1\})$ is illustrated in Fig. 3.

For an f-realization $x$, let $Node(S_A^G(x))$ be the union of the
corresponding seed sets produced by $S_A^G$ in c-$G$ in $x$. For a
node-set $V' \subseteq \bigcup_{i>0} V_c^i$, let $N_x(V')$ be the number of active
nodes in $x$ with seed set $V'$. Therefore,

$$E[S_A^G] = \sum_{x \in C_G(\epsilon)} Prob[x] \cdot N_x(Node(S_A^G(x))). \quad (2)$$

$N_x(\cdot)$ has the following important properties.

Property 1. If $V_1 \subseteq V_2$, then $N_x(V_1) \le N_x(V_2)$.

Property 2. For two node-subsets $V_1$ and $V_2$ of $\bigcup_{i>0} V_c^i$ and
a node $v' \in \bigcup_{i>0} V_c^i$ where $V_1 \subseteq V_2$ and $v' \notin V_2$, we have

$$N_x(V_2 \cup \{v'\}) - N_x(V_2) \le N_x(V_1 \cup \{v'\}) - N_x(V_1).$$

Proof. This proof is similar to that of Theorem 2.2 in [5]. The
only difference is that, in our case, the seed nodes and active
nodes are constrained in $\bigcup_{i>0} V_c^i$ and $V_c^0$, respectively.

□
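
Properties 1 and 2 can be checked by brute force on a small fixed realization. The Python sketch below counts the target nodes reachable from a seed set along live edges and tests monotonicity and the diminishing-returns inequality over all pairs of seed subsets. The toy realization and all names are hypothetical; this is a sanity check on one instance, not a proof.

```python
from itertools import combinations

def count_active(live_edges, seeds, targets):
    """Number of target nodes reachable from `seeds` along live edges."""
    reached, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for a, b in live_edges:
            if a == u and b not in reached:
                reached.add(b)
                stack.append(b)
    return len(reached & targets)

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def check_properties(live_edges, seed_pool, targets):
    """Return True if monotonicity and submodularity hold for all V1 <= V2."""
    n = lambda s: count_active(live_edges, s, targets)
    for v1 in subsets(seed_pool):
        for v2 in subsets(seed_pool):
            if not v1 <= v2:
                continue
            if n(v1) > n(v2):                       # Property 1 violated
                return False
            for v in seed_pool - v2:
                if n(v2 | {v}) - n(v2) > n(v1 | {v}) - n(v1):
                    return False                    # Property 2 violated
    return True

# Hypothetical realization: the first seeding of v3 fails ("s3a" has no live edge),
# the second succeeds ("s3b"), and the seeding of v1 succeeds ("s1").
targets = frozenset({"v1", "v2", "v3", "v4"})
seed_pool = frozenset({"s1", "s3a", "s3b"})
live = [("s3b", "v3"), ("v3", "v4"), ("s1", "v1"), ("v1", "v2")]
print(check_properties(live, seed_pool, targets))      # True
print(count_active(live, {"s1", "s3b"}, targets))      # 4
```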


3.2 Optimal Pattern 

As introduced in Sec. 2.2, a seeding pattern identifies how
much of the budget we consume in each step. Now, we
show that $A^*$ is the optimal pattern.


Lemma 1. For any DIC network $G$, suppose $A$ is an arbitrary
seeding pattern and $S_A^G$ is a known seeding strategy of $A$ on
$G$. There exists a seeding strategy $S_{A^*}^G$ of $A^*$ on $G$ such that
$E[S_{A^*}^G] = E[S_A^G]$.

Proof. The main idea is to construct a strategy $S_{A^*}^G$ according
to $S_A^G$ such that, in any f-realization $x$, $N_x(Node(S_{A^*}^G(x))) =
N_x(Node(S_A^G(x)))$.

Let $x$ be an arbitrary but unknown f-realization of c-$G$.
Suppose $S_A^G = (s_1, \ldots, s_N)$ and $A = (a_1, \ldots, a_N)$. Assume
$s_i = \{v_{i,1}, \ldots, v_{i,a_i}\}$ where the nodes are randomly ordered.
Note that $s_1$ is known before the process of spread and
$s_i$ ($i > 1$) is unknown until step $i$ as it depends on the
Step  Diffusion process under $S_{A_1}^{G_1}$            Outcomes under $S_{A_1}^{G_1}$                          Diffusion process under $S_{A^*}^{G_1}$  Outcomes under $S_{A^*}^{G_1}$
1     seeds $v_3$                                      $v_3$ fails to be activated                           seeds $v_3$                            $v_3$ fails to be activated
2     seeds $v_3$                                      $v_3$ is activated                                    seeds $v_3$                            $v_3$ is activated
3     $v_3$ activates $v_4$                            $v_4$ is activated                                    $v_3$ activates $v_4$                  $v_4$ is activated
4     seeds $v_1$; $v_4$ activates $v_5$               $v_5$ is activated; $v_1$ is activated                $v_4$ activates $v_5$                  $v_5$ is activated
5     $v_1$ activates $v_2$; $v_5$ activates $v_6$     $v_2$ is activated; $v_6$ fails to be activated       $v_5$ activates $v_6$                  $v_6$ fails to be activated
6     -                                                -                                                     seeds $v_1$                            $v_1$ is activated
7     -                                                -                                                     $v_1$ activates $v_2$                  $v_2$ is activated

TABLE 4: Seeding processes of $S_{A_1}^{G_1}$ and $S_{A^*}^{G_1}$.



outcomes of the past rounds. Let $Q$ be the sequence of the
nodes in $\bigcup_i s_i$, where the nodes are ordered non-decreasingly
by their indices in $s_i$ according to the lexicographical
order. Following pattern $A^*$, let $S_{A^*}^G$ choose the nodes in $Q$ in
order. For the example shown in Fig. 3 with f-realization $x_1$,
the seeding process of strategy $S_{A_1}^{G_1}$ and its corresponding
strategy $S_{A^*}^{G_1}$ are shown in Table 4.

One can see that $S_{A^*}^G$ does nothing but choose the nodes
that are chosen by $S_A^G$. Note that although $S_A^G$ is known
to us, the seed nodes produced by $S_A^G$ are undetermined as
they depend on $x$. Suppose $S_{A^*}^G$ selects $v_{i,j}$ in the $l$-th step,
and the p-realization in step $i$ under $S_A^G$ and that under
$S_{A^*}^G$ in step $l$ are $y_1$ and $y_2$, respectively. To guarantee the
feasibility of the construction of $S_{A^*}^G$, $y_2$ must be compatible
to $y_1$, which means that, in realization $x$, the events happening
by step $i$ under strategy $S_A^G$ are a subset of those happening
by step $l$ under strategy $S_{A^*}^G$. Otherwise, in step $l$, $S_{A^*}^G$
cannot determine which node $v_{i,j}$ is.

In fact, such feasibility can be guaranteed by pattern $A^*$.
Let $v_i$ be the $i$-th node in $Q$. Suppose $S_A^G$ and $S_{A^*}^G$ seed $v_i$
in step $l_i$ and step $l_i^*$, respectively. Let $y_i$ (resp. $y_i^*$) be the
p-realization under $S_A^G$ (resp. $S_{A^*}^G$) by step $l_i$ (resp. $l_i^*$). We
need to prove that $y_i^*$ is compatible to $y_i$ for any $i \ge 1$.
We prove it by induction. Clearly, $y_1^*$ is compatible to $y_1$ as
$y_1 = y_1^* = \epsilon$. Suppose $y_i^*$ is compatible to $y_i$ for any $i$ less
than some $k$. Now we prove that $y_k^*$ is compatible to $y_k$.
For contradiction, suppose $y_k^*$ is not compatible to $y_k$. By the
supposition, there is an event in $x$ that has happened in $y_k$ but
has not happened in $y_k^*$. However, $y_{k-1}^*$ is compatible to
$y_{k-1}$, and, by pattern $A^*$, no node can be further
activated in realization $x$ by step $l_k^*$ under $S_{A^*}^G$. This implies
that $S_A^G$ must wait for some null rounds between step $l_{k-1}$
and step $l_k$, which is a contradiction.

By the construction of $S_{A^*}^G$, since $Node(S_{A^*}^G(x)) =
Node(S_A^G(x))$ in any f-realization $x$, we have $E[S_{A^*}^G] = E[S_A^G]$
according to Eq. (2). □
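
The seeding order used in the construction above can be illustrated on the realized seed sets of Example 1. The Python sketch below flattens a sequence of seed sets into the queue $Q$ and spreads it over single-node steps; it operates on an already realized sequence, whereas in the proof each $s_i$ only becomes known at step $i$. The function names and the alphabetical order inside one step are assumptions of this sketch.

```python
def seeding_queue(seed_sets):
    """Flatten a realized sequence of seed sets into the queue Q, step by step."""
    queue = []
    for step_nodes in seed_sets:
        # Any fixed order inside one step works; sorting is just one choice.
        queue.extend(sorted(step_nodes))
    return queue

def one_per_step_schedule(seed_sets):
    """One seed per seeding step, taken from Q in order."""
    return [[v] for v in seeding_queue(seed_sets)]

# Seed sets of the strategy in Example 1: ({v3}, {v3}, {}, {v1}).
realized = [{"v3"}, {"v3"}, set(), {"v1"}]
print(seeding_queue(realized))          # ['v3', 'v3', 'v1']
print(one_per_step_schedule(realized))  # [['v3'], ['v3'], ['v1']]
```

The resulting order matches the right-hand columns of Table 4, where $v_3$ is seeded twice before $v_1$.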

One can see that a strategy of a pattern other than $A^*$
cannot always simulate a strategy of pattern $A^*$ by a similar
construction due to the feasibility issue discussed above.
Intuitively, pattern $A^*$ is optimal because it maximizes
the information obtained before making a seeding decision
and brings us more options in selecting seed nodes. The
above result is summarized as follows.

Theorem 1. Pattern $A^*$ is the optimal pattern on any graph $G$,
i.e., for any pattern $A$, $E[OPT_{A^*}^G] \ge E[OPT_A^G]$.

Proof. By Lemma 1, for any pattern $A$ and network $G$, we
always have some strategy $S_{A^*}^G$ of $A^*$ such that $E[S_{A^*}^G] \ge
E[OPT_A^G]$. Thus,

$$E[OPT_{A^*}^G] \ge E[S_{A^*}^G] \ge E[OPT_A^G].$$

□

3.3 Approximation Ratio 

In this section, we show that $\overline{S}_{A^*}^G$ has an approximation ratio
of $(1 - 1/e)$.

The method to represent the random event space is
critical to the analysis of a stochastic model. Essentially, the
adaptive seeding strategy $\overline{S}_{A^*}^G$ forms a decision tree, where
each node in the tree is a selected seed set and each out-edge
of the tree-node represents a possible successive event. Let
the root node be the first level. Then, each branch from level
$i$ to level $i + 1$ corresponds to a p-realization after round $i$
under $\overline{S}_{A^*}^G$. Each path from the root to a leaf is formed
by a sequence of p-realizations where each p-realization
is compatible to its predecessor. For the decision tree of
$\overline{S}_{A^*}^G$, let $Z_i = \{z_i^1, \ldots, z_i^{|Z_i|}\}$ be the set of the p-realizations
(branches) from level $i$ to level $i + 1$, where $|Z_i|$ is the number
of branches, and $Z_0 = \{\epsilon\}$. Although the basic event space
is unique, it can be represented via different decision trees
under different strategies. For Example 1 shown in Fig. 1,
the decision tree of a strategy of pattern $A^*$ on $G_1$ is shown
in Fig. 4, where the explanations are available in the caption.
Note that for a DIC network $G$ the decision tree of $\overline{S}_{A^*}^G$ is
determinate.

Now we are ready to show the main result of this paper.
Our goal is to prove that

$$E[\overline{S}_{A^*}^G] \ge (1 - 1/e) \cdot E[OPT_{A^*}^G].$$
Fig. 4: The decision tree of a strategy under pattern $A^*$ on the
example DIC network $G_1$. For the vector $(x_1, x_2, x_3, x_4, x_5, x_6)$ on
a branch $z_i^j$, that $x_i = 1$ (resp. $x_i = 0$) means node $v_i$ is active
(resp. inactive) after round $i$ through branch $z_i^j$. In this example,
branch $z_1^1$ implies $v_3$ is not successfully activated in step 1, and
following pattern $A^*$ we have in total 5 and 18 branches from
level 1 to level 2 and from level 2 to level 3, respectively.



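For an arbitrary network $G$, let $t_i$ be the $i$-th seed node
selected by $OPT^{G}_{A^*}$, and $T_i = \{t_1, \ldots, t_i\}$. Similarly, let $w_i$
be the $i$-th seed node selected by $\overline{S}^{G}_{A^*}$ and $W_i = \{w_1, \ldots, w_i\}$.
Set $T_0 = W_0 = \emptyset$. We use the decision tree to analyze the
seeding process.

For a node set $V'$ and a p-realization $z_i^j$, let

$$F_i^j(V') = \sum_{x \in C_G(z_i^j)} \mathrm{Prob}[x \mid z_i^j] \cdot N^{x}(V') \qquad (3)$$

and

$$F_i(V') = \sum_{j=1}^{|Z_i|} \mathrm{Prob}[z_i^j] \cdot F_i^j(V'). \qquad (4)$$

One can see that $F_i^j(W_i)$ is the expected number of active
nodes under seed set $W_i$ conditioned on p-realization $z_i^j$,
and $F_B(W_B) = E[\overline{S}^{G}_{A^*}]$.

By Rule ??,

$$w_{i+1} = \arg\max_{v} F_i^j(W_i \cup \{v\}). \qquad (5)$$

Let

$$\Delta_i^j = F_i^j(W_i \cup \{w_{i+1}\}) - F_i^j(W_i), \qquad (6)$$

for $0 \le i \le B-1$.

Lemma 2. $F_i^j(T_B) \le F_i^j(W_i) + B \cdot \Delta_i^j$.

Proof. For $1 \le h \le B$, by Property ??,

$$N^{x}(T_h \cup W_i) - N^{x}(T_{h-1} \cup W_i) \le N^{x}(\{t_h\} \cup W_i) - N^{x}(W_i).$$

Thus,

\begin{align*}
& \sum_{x \in C_G(z_i^j)} \mathrm{Prob}[x \mid z_i^j] \left( N^{x}(T_h \cup W_i) - N^{x}(T_{h-1} \cup W_i) \right) \\
\le{} & \sum_{x \in C_G(z_i^j)} \mathrm{Prob}[x \mid z_i^j] \left( N^{x}(\{t_h\} \cup W_i) - N^{x}(W_i) \right) \\
={} & F_i^j(\{t_h\} \cup W_i) - F_i^j(W_i) && \{\text{by Eq. (3)}\} \\
\le{} & F_i^j(W_i \cup \{w_{i+1}\}) - F_i^j(W_i) && \{\text{by Eq. (5)}\} \\
={} & \Delta_i^j. && \{\text{by Eq. (6)}\}
\end{align*}

Adding the above inequalities for all $1 \le h \le B$, we have

\begin{align*}
& \sum_{1 \le h \le B} \sum_{x \in C_G(z_i^j)} \mathrm{Prob}[x \mid z_i^j] \left( N^{x}(T_h \cup W_i) - N^{x}(T_{h-1} \cup W_i) \right) \\
={} & \sum_{x \in C_G(z_i^j)} \mathrm{Prob}[x \mid z_i^j] \left( N^{x}(T_B \cup W_i) - N^{x}(T_0 \cup W_i) \right) \\
={} & F_i^j(T_B \cup W_i) - F_i^j(W_i) \\
\le{} & B \cdot \Delta_i^j.
\end{align*}

Thus,

$$F_i^j(T_B) \le F_i^j(T_B \cup W_i) \le F_i^j(W_i) + B \cdot \Delta_i^j. \qquad (7)$$

Note that $T_B$ depends on $x$ and $W_i$ depends on $z_i^j$.

□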
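The first inequality in the proof of Lemma 2 has the diminishing-returns form: adding $t_h$ to the larger set $T_{h-1} \cup W_i$ gains no more than adding it to $W_i$. The sketch below checks this form by brute force on a toy example. It is an illustration under simplifying assumptions: a full realization is modelled only as a fixed set of live edges, node-level activation outcomes of the DIC model are omitted, and the toy graph and function names are hypothetical.

```python
from itertools import combinations


def reachable(live_edges, seeds):
    """Nodes reachable from `seeds` along the live edges of one realization."""
    adj = {}
    for u, v in live_edges:
        adj.setdefault(u, []).append(v)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def spread(live_edges, seeds):
    """Number of active nodes under `seeds` for this fixed realization."""
    return len(reachable(live_edges, seeds))


def is_submodular(nodes, live_edges):
    """Check N(S+v) - N(S) >= N(T+v) - N(T) for all S subset of T, v not in T."""
    subsets = [frozenset(c)
               for k in range(len(nodes) + 1)
               for c in combinations(nodes, k)]
    for s in subsets:
        for t in subsets:
            if not s <= t:
                continue
            for v in nodes:
                if v in t:
                    continue
                gain_s = spread(live_edges, s | {v}) - spread(live_edges, s)
                gain_t = spread(live_edges, t | {v}) - spread(live_edges, t)
                if gain_s < gain_t:
                    return False
    return True


# Hypothetical toy realization with five nodes and four live edges.
nodes = [1, 2, 3, 4, 5]
live_edges = [(1, 2), (2, 3), (4, 3), (4, 5)]
print(is_submodular(nodes, live_edges))
```

Taking $S = W_i$, $T = T_{h-1} \cup W_i$ and $v = t_h$ recovers the inequality used in the proof.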


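Let

$$\Delta_i = \sum_{j=1}^{|Z_i|} \mathrm{Prob}[z_i^j] \cdot \Delta_i^j. \qquad (8)$$

Lemma 3. $F_i(W_i) = \Delta_0 + \ldots + \Delta_{i-1}$.

Proof. Note that, for any $0 < h \le B$,

$$F_h(W_h) = \sum_{j=1}^{|Z_h|} \mathrm{Prob}[z_h^j] \cdot F_h^j(W_h) = \sum_{j=1}^{|Z_{h-1}|} \mathrm{Prob}[z_{h-1}^j] \cdot F_{h-1}^j(W_{h-1} \cup \{w_h\}).$$

Thus, we have

\begin{align*}
& \Delta_0 + \ldots + \Delta_{i-1} \\
={} & \sum_{h \le i-1} \sum_{j=1}^{|Z_h|} \mathrm{Prob}[z_h^j] \cdot \Delta_h^j && \{\text{by Eq. (8)}\} \\
={} & \sum_{h \le i-1} \sum_{j=1}^{|Z_h|} \mathrm{Prob}[z_h^j] \cdot \left( F_h^j(W_h \cup \{w_{h+1}\}) - F_h^j(W_h) \right) && \{\text{by Eq. (6)}\} \\
={} & \sum_{h \le i-1} \left( F_{h+1}(W_{h+1}) - F_h(W_h) \right) \\
={} & F_i(W_i) - \sum_{j=1}^{|Z_0|} \mathrm{Prob}[z_0^j] \cdot F_0^j(W_0) \\
={} & F_i(W_i).
\end{align*}

□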

Finally, we have the following lemma.

Lemma 4. $(1 - 1/e) \cdot E[OPT^{G}_{A^*}] \le E[\overline{S}^{G}_{A^*}]$.

Proof. For $0 \le i \le B-1$, we have

$$E[OPT^{G}_{A^*}] = F_B(T_B) = F_i(T_B).$$

By Lemma 2,

$$F_i^j(T_B) \le F_i^j(W_i) + B \cdot \Delta_i^j,$$

i.e.,

$$E[OPT^{G}_{A^*}] \le F_i(W_i) + B \cdot \Delta_i.$$

Thus, combining Lemma 3,

$$E[OPT^{G}_{A^*}] \le \Delta_0 + \ldots + \Delta_{i-1} + B \cdot \Delta_i. \qquad (9)$$

By multiplying both sides of Eq. (9) by $(1 - 1/B)^{B-1-i}$,
we have

$$E[OPT^{G}_{A^*}] \cdot (1 - 1/B)^{B-1-i} \le (\Delta_0 + \ldots + \Delta_{i-1} + B \cdot \Delta_i) \cdot (1 - 1/B)^{B-1-i}. \qquad (10)$$

Now we add up Eq. (10) for $0 \le i \le B-1$. The left side of
the summation is

$$\sum_{i=0}^{B-1} E[OPT^{G}_{A^*}] \cdot (1 - 1/B)^{B-1-i} = B \left( 1 - (1 - 1/B)^{B} \right) \cdot E[OPT^{G}_{A^*}]. \qquad (11)$$

On the right side, the coefficient of $\Delta_i$ is

$$B \cdot (1 - 1/B)^{B-1-i} + \sum_{j=i+1}^{B-1} (1 - 1/B)^{B-1-j} = B. \qquad (12)$$

Thus, by Eqs. (11) and (12),

\begin{align*}
E[OPT^{G}_{A^*}] \cdot B \cdot \left( 1 - (1 - 1/B)^{B} \right) \le{} & B \cdot (\Delta_0 + \ldots + \Delta_{B-1}) && (13) \\
={} & B \cdot F_B(W_B) && \{\text{by Lemma 3}\} \\
={} & B \cdot E[\overline{S}^{G}_{A^*}].
\end{align*}

Since $(1 - 1/B)^{B} \le 1/e$, the approximation ratio of $\overline{S}^{G}_{A^*}$ is at least
$(1 - 1/e)$. □

The above result is summarized as follows.

Theorem 2. $\overline{S}^{G}_{A^*}$ is a strategy within a factor $1 - 1/e$ from the
optimal strategy of pattern A*.

Since A* is the optimal pattern as discussed in Sec. 3.2,
$\overline{S}^{G}_{A^*}$ is a $(1 - 1/e)$-approximation of the AIM problem.

Corollary 1. $\overline{S}^{G}_{A^*}$ is a $(1 - 1/e)$-approximation of the AIM problem.

Golovin et al. [25] apply the stochastic submodular maximization
technique to several applications including the
influence diffusion in social networks. They conjecture that
applying Rule ?? to pattern $A_0$ in the classic IC model yields
a $(1 - 1/e)$-approximation to the optimal seeding strategy
under pattern $A_0$. Actually, the derivation of Theorem 2 can
be applied to any pattern where we seed at most one node
in each step in the DIC model. Therefore, since the classic
IC model is a special case of the DIC model, their conjecture
in [25] can be verified. In fact, under any
pattern, Rule ?? is able to provide an approximation with
the same ratio. As this paper focuses on designing practical
seeding strategies, we will not show the technical proof of
that result.

3.4 Implementation Issues

To implement the proposed greedy algorithm, the only
problem left is to calculate Eq. (5). Unfortunately, as discussed
in [?], it is #P-hard to calculate the exact value of
$\sum_{x \in C_G(y)} \mathrm{Prob}[x \mid y] \cdot N^{x}(V')$ in Eq. (3). However, we can
employ Monte Carlo simulation to obtain an accurate
estimate. By Hoeffding's inequality, the error of the
estimate can be made arbitrarily small when a sufficient number
of simulations are performed. Another possible concern
is the efficiency of the greedy algorithm, because a
large number of simulations may be required for an accurate
estimate. As shown in [?], the Lazy-Forward technique
can be implemented in a hill-climbing strategy and leads
to far fewer evaluations. The pseudo-code of $\overline{S}^{G}_{A^*}$ with the
Lazy-Forward method is shown in Algorithm 1. We denote this
adaptive seeding strategy by A-Greedy.

Algorithm 1 A-Greedy

 1: Input: G = (V, E, F_V, F_E) and budget B.
 2: CurrentBudget ← 0; A ← ∅;
 3: y_0 = ∅; // y_i is the p-realization after round i.
 4: for each v in V do δ_v ← +∞;
 5: for i = 1 : N do
 6:     if (CurrentBudget < B and no nodes can be further activated) then
 7:         for each v in V \ A do cur_v ← false;
 8:         while true do
 9:             v* = argmax_{v ∈ V \ A} δ_v
10:             if (cur_{v*} = true) then A ← A ∪ {v*}; break;
11:             else δ_{v*} = Σ_{x ∈ C_G(y_{i-1})} Prob[x | y_{i-1}] · N^x(A ∪ {v*}); cur_{v*} ← true;
12:         CurrentBudget ← CurrentBudget + 1;
13:     Get y_i; // wait for a round of spread
14: y* ← y_N
15: Return (A)
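To make the lazy-forward selection of Algorithm 1 concrete, the following Python sketch runs an adaptive greedy loop on a toy graph. It is a simplified illustration under assumptions that are not part of the DIC model: propagation probabilities are fixed per edge (classic IC), every seeding succeeds, each cascade runs to completion before the next seed is chosen, and gains are estimated with a small number of Monte Carlo runs. The function names, the toy graph and the parameter values are hypothetical.

```python
import heapq
import random


def cascade(graph, start, active, rng):
    """One IC cascade from an inactive node; returns the newly active nodes."""
    new, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        for v, p in graph.get(u, []):
            if v not in active and v not in new and rng.random() < p:
                new.add(v)
                frontier.append(v)
    return new


def estimate_gain(graph, node, active, sims, rng):
    """Monte Carlo estimate of the expected number of nodes newly
    activated by seeding `node`, given the observed active set."""
    total = sum(len(cascade(graph, node, active, rng)) for _ in range(sims))
    return total / sims


def adaptive_lazy_greedy(graph, nodes, budget, sims=200, seed=0):
    rng = random.Random(seed)        # drives the Monte Carlo estimates
    world = random.Random(seed + 1)  # drives the diffusion that is observed
    active, seeds = set(), []
    # Heap entries: (-gain bound, node, round in which the bound was computed).
    heap = [(-float("inf"), v, -1) for v in nodes]
    heapq.heapify(heap)
    for rnd in range(budget):
        while heap:
            _, v, stamp = heapq.heappop(heap)
            if v in active:
                continue  # already activated by an earlier cascade
            if stamp == rnd:
                # The bound is fresh and still the largest: seed v,
                # then observe the resulting spread before continuing.
                seeds.append(v)
                active |= cascade(graph, v, active, world)
                break
            # Stale bound: re-evaluate against the current state.
            gain = estimate_gain(graph, v, active, sims, rng)
            heapq.heappush(heap, (-gain, v, rnd))
    return seeds, active


# Hypothetical toy graph: node -> list of (neighbour, propagation probability).
graph = {
    0: [(1, 0.5), (2, 0.5)],
    1: [(3, 0.5)],
    2: [(3, 0.5)],
    4: [(5, 1.0)],
}
seeds, active = adaptive_lazy_greedy(graph, nodes=range(6), budget=2)
print(seeds, sorted(active))
```

The sketch stores marginal gains rather than the absolute value used in line 11 of Algorithm 1. A bound computed in an earlier round then remains a valid upper bound in expectation, because observing more active nodes can only shrink the gain of a candidate.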

4 Heuristic Seeding Strategy 

In this section, we present a heuristic adaptive seeding
strategy based on the greedy algorithm in Sec. 3. To reduce
the time consumed in the seeding process, a simple idea is to
reduce the number of nodes that are considered as candidate seed
nodes. Obviously, the performance of the seeding strategy
cannot be guaranteed if we inappropriately exclude some
nodes before the seeding process. Thus, we study
which kinds of nodes can be ignored in the seeding process.
An important observation, as shown later in Sec. 5.2, is that
there can be a significant gap in strength between the
influential nodes and the other nodes. This fact is consistent
with the power-law nature of real-world social networks,
where the node degrees follow a power-law distribution.
Motivated by this observation, we design a heuristic
seeding strategy, termed H-Greedy, that narrows the
candidate seed set before the seeding process.

H-Greedy. Let $H(v)$ be the number of nodes that can
be activated by a single seed node $v$. Let $E[\cdot]$ and $Std[\cdot]$
denote the mean and the standard deviation of a random
variable. H-Greedy consists of two steps. First, before we
start the seeding process, we obtain by Monte Carlo simulation
the estimates of $E[H(v)]$, $E[\sum_{v \in V} H(v)/N]$
and $Std[\sum_{v \in V} H(v)/N]$. We denote those three estimates
by $\hat{E}[H(v)]$, $\hat{E}[\sum_{v \in V} H(v)/N]$ and $\widehat{Std}[\sum_{v \in V} H(v)/N]$,
respectively. Then, when determining a seed node in the
seeding process, we omit a node $v$ if $\hat{E}[H(v)]$ is less than
the lower 1-sigma control² of $\hat{E}[\sum_{v \in V} H(v)/N]$.
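The pruning step can be sketched as follows. This is a minimal illustration, assuming that each Monte Carlo run records $H(v)$ for every node, that $N$ is the number of nodes, and that the standard deviation is the sample standard deviation of the per-run average $\sum_{v \in V} H(v)/N$. The function name and the toy numbers are hypothetical.

```python
import statistics


def prune_candidates(samples):
    """Keep nodes whose estimated E[H(v)] reaches the lower 1-sigma control.

    samples[k][v] is the number of nodes activated when v is the only
    seed in simulation run k. Returns (kept nodes, threshold).
    """
    nodes = list(samples[0].keys())
    n = len(nodes)
    # Per-node estimate of E[H(v)].
    mean_h = {v: statistics.mean([run[v] for run in samples]) for v in nodes}
    # One value of sum_v H(v)/N per simulation run.
    per_run_avg = [sum(run.values()) / n for run in samples]
    # Lower 1-sigma control: mean minus standard deviation.
    threshold = statistics.mean(per_run_avg) - statistics.stdev(per_run_avg)
    kept = [v for v in nodes if mean_h[v] >= threshold]
    return kept, threshold


# Hypothetical toy data: two simulation runs over three nodes.
samples = [
    {"a": 10, "b": 1, "c": 1},
    {"a": 14, "b": 1, "c": 3},
]
kept, threshold = prune_candidates(samples)
print(kept, round(threshold, 3))
```

Only the nodes in `kept` would then be considered as candidates during the seeding process.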

As discussed in the prior works, Monte Carlo simulation
is usually executed 10000 to 20000 times for an
accurate estimate. However, in the first step of H-Greedy,
1000 to 2000 simulations are sufficient, because the
estimates need not be very accurate as they are
merely used to narrow the candidate set of seed nodes. With
a smaller set of candidate seed nodes, the time consumed in
the seeding process can be significantly reduced, as about
half of the nodes will not be considered as seed nodes.
As shown later, the performance of H-Greedy is close to that of
A-Greedy, which has a provable performance guarantee. We
will further discuss the feasibility of H-Greedy in the next
section.

5 Experiment 

In this section, we show the results of the conducted experiments.
In order to evaluate the proposed adaptive seeding
strategies, we examine their performance
from the following aspects: (a) the influence spread compared
to non-adaptive seeding strategies; (b) the effectiveness
and efficiency of the heuristic strategy.

5.1 Experiment Setup 

In order to fairly compare the performance of our seeding
strategies to that of the existing approaches, we employ
two real-world social networks, which have been widely
used in prior works, and a synthetic power-law network,
which is able to capture the key features of real social
networks. The propagation probabilities are generated from
three distributions, as shown later.

2. Mean minus standard deviation

Network structure. The first real-world social network,
denoted by Hep, is an academic collaboration network built from
co-authorships in physics. Hep is compiled from the "High
Energy Physics - Theory" section of the e-print arXiv³ and
has been widely used in prior works (e.g. [?], [8], [10]
and [?]). For each pair of authors who have a co-authorship,
we add two directed edges, one in each direction.
The resulting network has about 15,000 nodes and 58,000
directed edges. The second dataset, denoted by Wiki, contains
the Wikipedia voting data from the inception of
Wikipedia. Nodes in this network represent Wikipedia users,
and a directed edge from node u to node v in the original data
means that user u votes on user v, i.e., v has influence over u.
Thus, if there is an edge from u to v in the original data,
we add an edge from v to u in Wiki. Wiki has about 8,600
nodes and 103,000 directed edges and has been studied in
[?], [?] and [23]. The last dataset is a synthetic power-law
network generated by [24]. The synthetic power-law
network selected in this paper, denoted by PL, includes
2500 nodes and 26,000 directed edges. A power-law degree
distribution has been shown to be one of the most important
characteristics of social networks [?]. We use the PL dataset to
evaluate the performance of the proposed seeding strategies
in general social networks.
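The two edge constructions described above can be sketched as follows. This is only an illustration of the direction conventions: the function names and the toy inputs are hypothetical, and the actual preprocessing of the datasets is not specified beyond the description in the text.

```python
def coauthorship_edges(pairs):
    """Each co-authorship (a, b) yields two directed edges, one per direction."""
    edges = set()
    for a, b in pairs:
        edges.add((a, b))
        edges.add((b, a))
    return edges


def influence_edges_from_votes(votes):
    """A vote (u, v) means u votes on v, so v influences u: store (v, u)."""
    return {(v, u) for u, v in votes}


# Hypothetical toy inputs.
print(sorted(coauthorship_edges([("alice", "bob")])))
print(sorted(influence_edges_from_votes([("u1", "u2")])))
```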

Propagation probability. The three distributions (indexed by
i = 1, 2, 3) of the propagation probability $X_e$ of an edge $e$ are
as follows. In the first, the propagation probabilities are
fixed at 0.01, which is the same as that in [?]. The second is an
exponential distribution with a mean of 0.01. The third is a
uniform discrete distribution over {0.1, 0.01, 0.001}.

Activation probability. We assign a uniform activation
probability to each node $u$, choosing Prob[$X_u = 1$] to be 1
or 0.5.

Note that the model reduces to the classic IC model under the
first distribution with $X_u = 1$.
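The three propagation-probability settings and the node activation setting can be sampled as in the sketch below. This is an illustration only: the integer labels 1 to 3 and the function names are hypothetical, and clipping the exponential draw to at most 1 is an added assumption so that the result is always a valid probability.

```python
import random


def sample_propagation_probability(kind, rng):
    """Draw the propagation probability of one edge under setting `kind`."""
    if kind == 1:
        return 0.01  # fixed probability
    if kind == 2:
        # Exponential with mean 0.01 (rate 100); clipped to 1 by assumption.
        return min(1.0, rng.expovariate(1 / 0.01))
    if kind == 3:
        return rng.choice([0.1, 0.01, 0.001])  # uniform discrete
    raise ValueError("kind must be 1, 2 or 3")


def node_activates(activation_probability, rng):
    """Draw X_u: True with the given probability (1 or 0.5 in the setup)."""
    return rng.random() < activation_probability


rng = random.Random(0)
draws = [sample_propagation_probability(2, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to the intended mean of 0.01
```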

Seeding strategies. The tested seeding strategies are
as follows.

1) Greedy. This is the state-of-the-art non-adaptive seeding
   approach proposed in [5]. In Greedy, the nodes
   are selected by a hill-climbing algorithm before the
   diffusion process. When implementing Greedy in
   the DIC model, we fix each propagation probability
   at its mean, as the real propagation probabilities are
   unavailable in the DIC model before the start of the
   diffusion process. For each estimation, 10000 simulations
   are run to obtain an accurate estimate.

2) A-Greedy. This is the greedy adaptive seeding
   strategy proposed in Sec. 3. Similarly, 10000 simulations
   are run to obtain an accurate estimate of
   $\sum_{x \in C_G(y_{i-1})} \mathrm{Prob}[x \mid y_{i-1}] \cdot N^{x}(A \cup \{v^*\})$ in line 11 of
   Algorithm 1.

3) H-Greedy. This is the heuristic adaptive seeding
   strategy proposed in Sec. 4. In the first step of
   H-Greedy, 2000 simulations are run to obtain the
   estimates mentioned in Sec. 4.

4) Random. This is a baseline seeding strategy where
   the seed nodes are selected randomly.

Fig. 5: Comparing A-Greedy with Greedy. Panels: (a) ?? with Prob[$X_u = 1$] = 1 on Hep; (b) ?? with Prob[$X_u = 1$] = 1 on Hep; (c) ?? with Prob[$X_u = 1$] = 0.5 on Hep; (d) ?? with Prob[$X_u = 1$] = 0.5 on Hep; (e) ?? with Prob[$X_u = 1$] = 1 on PL; (f) ?? with Prob[$X_u = 1$] = 0.5 on PL; (g) ?? with Prob[$X_u = 1$] = 1 on Wiki. In all seven graphs, the y-axis and x-axis denote the number of active nodes and the
budget, respectively. Each graph gives four curves plotting the influence spread under the four seeding strategies.

3. http://www.arXiv.org

As discussed in the prior works, the seeding strategies based
on shortest paths and high degree perform worse than
Greedy. Thus we ignore those measures. In our experiments,
the budget is chosen from 10 to 30.


5.2 Results 

First, we discuss the performance of A-Greedy. As shown
in Fig. 5, A-Greedy outperforms Greedy under all circumstances.
This is intuitive, as the adaptive seeding strategies
are able to utilize the outcomes of the past rounds. As
shown in Fig. 5a, A-Greedy is superior to Greedy by a
notable margin even in the classic IC model. For the DIC
model, where the diffusion process is more uncertain,
the results herein verify the significant advantages of the
adaptive seeding strategy over the non-adaptive seeding
strategy. We discuss the results in detail in the following.

For the Hep network, as shown in Fig. 5a, A-Greedy
is 125% better than Greedy in the classic IC model, i.e., with
fixed propagation probabilities and Prob[$X_u = 1$] = 1. When the
uncertainty of the diffusion process increases, namely by changing
Prob[$X_u = 1$] to 0.5 as shown in Fig. ??, A-Greedy
becomes 320% better than Greedy. As shown in Figs. 5f and 5g
for the PL and Wiki networks, we have similar
results. For example, for the PL network under ?? with
Prob[$X_u = 1$] = 0.5, one seed node results in about 2.5 active
nodes under A-Greedy, while on average 1.67 nodes can be
activated by a single seed node under Greedy. Another important
observation is that the curves generated by Greedy
become less stable in the DIC model, which implies that
to reach the same level of accuracy Greedy requires more
simulations than A-Greedy does.


Now let us discuss the performance of the proposed
heuristic seeding strategy H-Greedy. Fig. 6 shows the distribution
of $E[H(v)]$ drawn from the datasets by simulation. In
Fig. 6a, 90% of the nodes cannot activate more than 2 nodes,
while in Figs. 6b and 6c we can see that there is a significant
gap between the strength of influential nodes and that of
other nodes. For example, as shown in Fig. 6b, 24 percent of
the nodes in Wiki can activate more than 1600 nodes while
82 percent of them can hardly activate more than 50 nodes.
For the PL dataset in the same setting, about 30 percent of the
nodes can bring 780 active nodes while 68 percent of them
result in fewer than 100 active nodes. Admittedly, the
difference in $E[H(v)]$ between two nodes decreases
along the seeding process due to submodularity. Nevertheless,
the nodes with small $E[H(v)]$ are not likely to become seed
nodes, as the gap is too large and we only have a small
budget compared to the population of users. Thus, the 1-sigma
control on $E[H(v)]$ is a safe bound such that we will not
miss any influential nodes. As shown in Fig. 5, under all
the circumstances the performance of H-Greedy is almost
the same as that of A-Greedy. This is because in those
settings H-Greedy can hardly eliminate any nodes, as the
distributions of $E[H(v)]$ are like Fig. 6a. Thus, H-Greedy is
identical to A-Greedy in those cases. However, for the cases
where the distribution of $E[H(v)]$ has a pattern like Figs. 6b
or 6c, H-Greedy would be an effective and efficient strategy.
In these cases, H-Greedy could rule out more than half
of the nodes from the candidate seed nodes, and thus more
than 20% of the time consumed in the seeding process could be
saved, as shown in Table 5. Furthermore, H-Greedy performs
slightly worse than A-Greedy but still better than Greedy, as
shown in Figs. 7a and 7b.

Fig. 6: Distributions of $E[H(v)]$ of the three datasets (Hep, PL and Wiki networks) under different propagation probabilities. The x-axis is the number of nodes activated by a single node. Panels: (a) ?? with Prob[$X_u = 1$] = 1; (b) ?? with Prob[$X_u = 1$] = 1; (c) ?? with Prob[$X_u = 1$] = 1.

Fig. 7: Comparing H-Greedy with A-Greedy. The y-axis and x-axis denote the number of active nodes and the budget, respectively.
Each graph gives three curves plotting the influence spread under A-Greedy, H-Greedy and Greedy, respectively. We ignore
Random here as it performs poorly.

TABLE 5: Scalability of H-Greedy. The four cases are shown
in the first column. The second and third columns show the
average time consumed in selecting one seed node under
H-Greedy and A-Greedy.

Parameter Setting                      H-Greedy (ms)   A-Greedy (ms)
?? & Prob[X_u = 1] = 1 on PL               14977           51485
?? & Prob[X_u = 1] = 1 on Wiki             87412          268499
?? & Prob[X_u = 1] = 1 on PL                 981           11931
?? & Prob[X_u = 1] = 1 on Wiki             31247           44625

6 Conclusion and Future work

In this paper we have considered the problem of how
to maximize the spread of influence in dynamic social networks.
The proposed DIC model is able to capture the dynamic
aspects of a real social network and the uncertainty
of the diffusion process. In the DIC model, a node
can be seeded more than once and the propagation
probability between two users varies following a certain
distribution. Based on the DIC model, we formulate the
adaptive seeding strategies by introducing the concept of
seeding pattern. The pattern A* constructed in Sec. 3.2 shows
the optimal way to determine how much budget to
use in each seeding step. Combining the optimal
pattern with the natural hill-climbing algorithm, we present
the A-Greedy seeding strategy and show that A-Greedy has
a performance ratio of (1 - 1/e). Based on the observation that the
influential nodes are much more powerful than other nodes
in a social network, we further design a simple heuristic
adaptive seeding strategy, H-Greedy, based on A-Greedy.
The experimental results herein demonstrate the superiority
of the adaptive seeding strategies over prior approaches.

The future work on this topic consists of several aspects.
As we can see, H-Greedy is a simple heuristic strategy
and it is not effective for all the settings of the DIC model.
Thus, we plan to design better heuristic adaptive seeding
strategies that are able to deal with general social networks.
We note that the technique in [?] is possibly applicable to
the adaptive seeding framework and we leave this part
as future work. Another aspect of the future work is to
design adaptive seeding strategies that are able to meet
a round limit. In real applications, we may only care about
the influence spread within a certain number of rounds.
In this case, the analysis of the adaptive seeding strategies
becomes intricate. On the one hand, as shown by pattern A*,
we try to use the budget as late as possible in order to
obtain more information. On the other hand, delaying
a seeding step costs us a diffusion round when there is a
round limit. One can easily check that with a round limit
our objective function is no longer submodular, which
makes it harder to find a greedy algorithm with a
provable performance guarantee.